DETAILED ACTION
Response to Arguments 

Applicant's arguments, see Remarks pages 1-4 and claim amendments, 
filed 3 March 2006, with respect to the rejection(s) of claim(s) 1-10, 13, 21-27, 
29, and 32-38 under various statutes have been fully considered and are partially 
persuasive as explained below. 

Applicant has clarified what means correspond to what function (see 
Remarks page 1, bottom of page) with regard to claims 8-10 and 13.

The rejection of claims 8-10 and 13 under 35 USC 112, second 
paragraph, as failing to comply with the requirements of 35 USC 112, sixth 
paragraph, stands withdrawn since applicant has met the requirements as recited 
above. 

The objection to the drawings stands withdrawn as per the above; it was
made in conjunction with the rejection of claims 8-10 and 13 under 35 USC 112,
second paragraph.

The coupled objection to the specification also stands withdrawn for the 
same reasons as above. 

Applicant's arguments with respect to claims 1-7, 21-27, and 32-38 are 
not found to be persuasive in all regards. However, in light of applicant's
amendments to the claims and in order to expedite prosecution, examiner has 
added a teaching reference that covers applicant's addition of specific
language to the claims.

The rejections of claims 1-7, 21-27, and 32-38 under 35 USC 103(a) in
view of Wilson, Willson, and various other references stand withdrawn.
New grounds of rejection follow below. 

Additionally, it shows that in the art of image processing, the terms 
'convolution' and 'filtering' are synonymous, and that applicant's assertion that 
FIR digital filters are not an analogous field for one skilled in the art of computer
graphics is mistaken. One skilled in computer graphics who was performing
image convolution operations would inherently be familiar with the various 
kinds of filters used in the art, where filters are used for filtering images and any 
other digital signal representing some phenomenon - see for example edge 
detection and the like. Examiner upon request by applicant to this effect will 
provide substantial evidence; applicant is kindly requested to immediately 
communicate intent to request such material to examiner, so that it can rapidly be 
made of record.

Allowable Subject Matter 

Claims 8-10, 13, and 39-41 are allowed for the reasons discussed in the 
previous Office Actions, since applicant has corrected the deficiencies that they 
contained and they were previously indicated allowable. 

Claims 16-17 and 29 are objected to as being dependent upon a rejected 
base claim, but would be allowable if rewritten in independent form including all 
of the limitations of the base claim and any intervening claims, as discussed in 
the last Office Action.

It is noted that the 'means' recited in claim are assumed to be the SM
ASIC in Figure 8, as discussed on pages 9-10 of the instant specification.

Allowable Subject Matter 

The indicated allowability of claims 14-17 is withdrawn in view of the newly 
discovered reference(s) to Bui. Rejections based on the newly cited reference(s) 
follow. 

Definitions 

The term 'rendered' is not defined by applicant's specification. The 
standard definition for 'rendered' in the context of computer graphics is that an 
image that is digitized and exists within some type of memory or storage, volatile 
or nonvolatile, is processed and sent to a display device - such as a LCD, CRT, 
and the like. 

This definition is consistent with the intrinsic record. Examiner is giving 
claims their broadest reasonable interpretation as per MPEP 2105.

Further, the term 'convolution' in the context of digital signal and image 
processing has certain meanings. An image convolution is inherently a filtering 
operation. Further, in mathematics, the definition of a kernel (one definition) of a 
function (e.g. convolution) is: the equivalence relation on the function's domain that
roughly expresses the idea of "equivalent as far as the function f can tell". 
However, the point is that image convolution represents a function, and that 
function has a kernel in the mathematical sense that conveys the operation of the 
function. Therefore, any convolution operation will inherently have a kernel. See 
as an example Bui 1:5-2:45.

Bui, which is substituted for Willson, provides a full explanation of how
convolution is equivalent to filtering and Bui is clearly analogous art, since it 
convolves images. 
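
The sense in which an image convolution is a filtering operation with a kernel can be illustrated with a minimal sketch. The sketch is not taken from Bui or from the claims; the function name `convolve2d`, the zero padding at the image borders, and the 3x3 averaging kernel `BOX_3X3` are assumptions chosen only for illustration.

```python
def convolve2d(image, kernel):
    """Convolve a 2-D list `image` with a small 2-D list `kernel`.

    Assumption for this sketch: samples outside the image count as zero.
    """
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2          # kernel center
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    # Convolution flips the kernel relative to the image.
                    sy, sx = y + ch - j, x + cw - i
                    if 0 <= sy < h and 0 <= sx < w:
                        acc += kernel[j][i] * image[sy][sx]
            out[y][x] = acc
    return out


# Hypothetical 3x3 averaging (low-pass) filter used as the kernel.
BOX_3X3 = [[1 / 9] * 3 for _ in range(3)]
```

Under these assumptions, every output pixel is a weighted sum of the input pixels that fall under the kernel, which is the same operation a digital filter performs on a signal.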

Claim Rejections - 35 USC §112 
The following is a quotation of the second paragraph of 35 U.S.C. 112:

The specification shall conclude with one or more claims particularly pointing out and distinctly 
claiming the subject matter which the applicant regards as his invention. 

Claim 14 is rejected under 35 U.S.C. 112, second paragraph, as being
indefinite for failing to particularly point out and distinctly claim the subject matter 
which applicant regards as the invention. 

Specifically, the claim recites the number 'N' without specifying the range 
of such a number. Therefore, the claim is indefinite. Also, if N were negative, 
that would make no sense. 

Specifically, if there were only one sample manager, it would therefore 
calculate partial sums - but if N is 0, then no partial sums would ever be 
calculated for anything, and the system would not work as designed. 

Therefore, if k were zero, where N could be zero, the system could never 
work in the manner that the claim specifies, because if it were, there would be no 
processing elements or sample elements, but would only be a singular partial 
sums bus, connected to nothing.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for 
all obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described
as set forth in section 102 of this title, if the differences between the subject matter sought to
be patented and the prior art are such that the subject matter as a whole would have been 
obvious at the time the invention was made to a person having ordinary skill in the art to which 
said subject matter pertains. Patentability shall not be negatived by the manner in which the 
invention was made. 

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1,
148 USPQ 459 (1966), that are applied for establishing a background for 
determining obviousness under 35 U.S.C. 103(a) are summarized as follows: 

1. Determining the scope and contents of the prior art.

2. Ascertaining the differences between the prior art and the claims at 
issue. 

3. Resolving the level of ordinary skill in the pertinent art. 

4. Considering objective evidence present in the application indicating 
obviousness or nonobviousness. 

Claims 1-2 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Wilson (US 5,129,092) in view of Bui et al (US 4,998,288 A).

As to claim 1,

A computer graphics system for generating pixels from a distributed convolution 
of rendered samples comprising: (Wilson processes images; Wilson is designed
to process images in 8x8 bit submatrices, as in Abstract, and 2:45-3:4, where an
image inherently consists of pixels. Wilson performs convolution (18:20-60) on 
sections of an image broken down into smaller data pieces (1:5-20), utilizing a 
plurality of processors 10a-10n, which therefore constitutes 'distributed 
processing')(Bui teaches image convolution - 1:10-2:30, where the process of 
convolution, dividing an image into smaller kernels and performing filtering is 
explained; see 3:10-35 where a general purpose digital convolver of the present 
invention is explained) 

-A plurality of sample managers connected in series; and (Wilson clearly teaches 
in Abstract and in Figure 1 a plurality of processor groups each comprising of 
individual processing elements 10a-10h, wherein each group of elements is 
connected to the element adjacent to it using data lines 11i-11n and register shift
lines 21i-21n, as set forth in 5:55-6:34, these elements are clearly connected in 
series (as in 2:45-60, where it states that these elements are connected in a 
linear chain, where in Figure 1 it is clear that data is input from data input device 20
over lines 21a and then passed in a one-way manner down the line on lines 
21i->21n))(Bui Figure 2 teaches a plurality of sample managers, e.g. filter 
elements in rows, that constitute parallel sums, which are similar to those found 
in Wilson as above, where each element generates a result and passes it down
the chain - see the multipliers and adders, where each multiplier has a 
preassigned coefficient value - see 3:50-65 - e.g. one of the coefficients of the 
3x3 convolution kernel - e.g. the filter)

-A set of partial sums buses, wherein each partial sums bus connects one of the
sample managers of the series to the next sample manager in the series; (Wilson 
- the term bus is well known in the art to merely mean one or more data transfer 
lines, wherein the data lines 11i-11n and 21i-21n clearly move data in byte-size
chunks, which therefore means that they are "buses" in the sense meant by
applicant. Clearly, these buses can transfer partial sums, as in 18:20-60, where
it is noted that partial sums can be moved along the chain of processors)(Bui - 
clearly the connections between each element constitute a sample bus - the 
original data enters the pipeline and is weighted, then is subjected to a delay and 
placed on the partial sum bus and passed down the addition nodes on the 
accumulator lines) 

-Wherein each sample manager is operable to calculate partial sums for a 
corresponding portion of the rendered samples located within a convolution 
kernel corresponding to a pixel location, wherein the partial sums comprise 1) a
sum of weights determined for locations of the rendered samples in the portion of 
rendered samples and 2) a sum of weighted sample values for the portion of 
rendered samples, (Wilson is designed to process images in 8x8 bit submatrices, 
as in Abstract, and 2:45-3:4. Further, in 1:10-35 Wilson teaches that inherently, 
data arrays such as images must be broken into smaller data array sizes with 
dimensions equivalent to the size of the processor array. Therefore, the recited 
'convolution kernel' in applicant's specification consists of a certain N x N region, as
noted in Remarks page 1, of which the 8x8 sub-matrix or sub-array of Wilson
would clearly qualify. Clearly, the resultant element is passed down the
processor line to be operated upon, which provides a sum of weighted values for
that portion of samples. Further, in 18:20-60 it is clearly explained that the 
system is intended to handle convolutions and/or sums as part of processing 
images, where these can clearly be partial sums. Wilson very clearly teaches 
many common tasks in image filtering, such as transposes (16:40-50), 
accumulation (19:20-60), and the like (16:53-18:20, 18:60-19:50))(Bui - Separate
regions of the images are convolved - 4:34-40 - every pixel in the image is 
subjected to the convolution kernel and serves as the center point in it before 
processing is completed. Bui 2:33-50 provides that video pixel elements that fall 
within the convolution kernel are firstly weighted using multipliers having 
preassigned coefficients, which are then added together. See for example 
Figure 2 - each pixel is firstly sent through a multiplier M02 where it is weighted, 
then along a sample bus 28 through a delay element to an adder A01. This
continues down the chain, where each weighted element is added to the bus line
and passed down. In other words, the coefficients assigned to the multipliers 
comprise a sum of weights for the various elements, and at the end of each 
branch, a partial sum is output. Then, that partial sum from that row is passed 
along another partial sum bus to another delay element H, where the master 
partial sum bus then adds the row output partial sums together and a final value 
is output as convolved video output 210.) 
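
The row-by-row accumulation described above can be sketched as follows. This is an illustration under stated assumptions, not a model of Bui's circuit: the delay elements are omitted, the window of pixels is assumed to be already aligned with the coefficients, and the names `row_partial_sum` and `convolve_window` are hypothetical.

```python
def row_partial_sum(pixels, coefficients):
    """Weight one row of the window by its preassigned coefficients
    (the multipliers) and add the products (the adders)."""
    return sum(c * p for c, p in zip(coefficients, pixels))


def convolve_window(window, kernel):
    """Add the row partial sums together to obtain one output value."""
    total = 0
    for pixels, coefficients in zip(window, kernel):
        # Each row contributes its own partial sum to the running total.
        total += row_partial_sum(pixels, coefficients)
    return total
```

For a 3x3 window, three row partial sums are produced and then added, so the final value equals the full weighted sum over all nine pixels.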

-Wherein each of the second through the last sample manager in the series is 
operable to add the partial sums calculated for its corresponding portion of the 
rendered samples to any previously accumulated partial sums received from the
prior sample manager in the series, and if not the last sample manager in the
series, output new accumulated partial sums to the next sample manager in the 
series. (Wilson clearly teaches that for the accumulator model, each group of 
processing elements passes the partial sums along towards the right. Clearly, 
the system will take the partial sums calculated for its portion of the same sample 
and output the results to the next group of processing elements in the series)(Bui 
Figures 2-4 clearly show that the partial sums produced by any element that is 
not the first in a chain is passed down the partial sums bus, with the output of 
that particular element added to the result already in the partial sums bus, and 
that result is passed down the chain. The last element in the chain outputs the 
convolved video output). 
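
The chained accumulation recited in the claim can be illustrated with a minimal sketch. It is not drawn from Wilson, Bui, or applicant's specification. The names `partial_sums` and `run_chain`, the representation of a sample as a (position, value) pair, and the final division of the weighted sum by the sum of weights are all assumptions made for illustration.

```python
def partial_sums(samples, weight_fn):
    """Return (sum of weights, sum of weighted values) for the samples
    handled by one sample manager. Each sample is (position, value)."""
    sum_w = 0.0
    sum_wv = 0.0
    for position, value in samples:
        w = weight_fn(position)
        sum_w += w
        sum_wv += w * value
    return sum_w, sum_wv


def run_chain(portions, weight_fn):
    """Pass accumulated partial sums down a series of sample managers.

    `portions` holds one list of samples per manager, in series order.
    """
    acc_w, acc_wv = 0.0, 0.0           # nothing accumulated before the first manager
    for portion in portions:
        w, wv = partial_sums(portion, weight_fn)
        acc_w += w                     # add to what the prior manager sent
        acc_wv += wv
    # Assumed final step: the last manager normalizes to get a pixel value.
    return acc_wv / acc_w if acc_w else 0.0
```

With uniform weights, two managers holding one and two samples respectively yield the plain average of the three sample values, which shows that the order of accumulation along the chain does not change the result.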

Wilson per se teaches most of the limitations of the above claim, but does 
not expressly teach certain details of how a chain of linear accumulation 
elements for an image-processing filter would operate, and expressly stating how
image convolution works, along with explaining the partial sums weighting 
process more completely. 

The Bui reference is added to explicitly cover certain details of how a 
chain of linearly connected accumulation elements for an image-processing filter 
would operate, with respect to how various elements of a sequence of 
processing elements would communicate with regards to partial sums 
transmission, and that the results are passed down the partial sums busses and 
added together. It would have been obvious to one of ordinary skill in the art at 
the time the invention was made to combine the systems of Wilson and Bui 



Application/Control Number: 10/673,087 Page 11

Art Unit: 2628 

because the system of Bui induces fewer delays and can operate in a faster
manner for purposes of convolution of specific elements - see for example
5:1-20, which avoids the known larger delays in the prior art, such as 4:46-57. While
the system of Wilson in 2:45-60 is clearly specified to perform image processing
and convolution, as well as accumulation operations, the Wilson reference is
silent on how it actually implements image filtering. The Bui reference clearly
performs image filtering using weighted partial sums buses as defined above,
where in 1:63-2:2 it is specified how accumulation processes are used in image
convolution.

As to claim 2, clearly Wilson is computing image data as noted above, so 
the last sample manager would clearly calculate pixel values, where the output 
data is clearly computed by the final element and sent out from the final unit 
using data line 21p to output device 22, which clearly shows that such data is 
image data, and the like. 

Bui clearly performs image convolution at the last output manager, adds 
all the partial sums together, and generates the convolved video output (210, 
310, 410- Figures 2, 3, and 4). 

As to claim 14, 

A system for distributed filtering of samples comprising: (Wilson processes 
images; Wilson is designed to process images in 8x8 bit submatrices, as in
Abstract, and 2:45-3:4, where an image inherently consists of pixels. Wilson 
performs convolution (18:20-60) on sections of an image broken down into 
smaller data pieces (1:5-20), utilizing a plurality of processors 10a-10n, which
therefore constitutes 'distributed processing')(Bui teaches image convolution -
1:10-2:30, where the process of convolution, dividing an image into smaller
kernels and performing filtering is explained; see 3:10-35 where a general 
purpose digital convolver of the present invention is explained. Clearly, image
pixels constitute samples.) 

-A series of N sample managers (k), wherein k is an integer with range 0 to N-1;
(Wilson clearly teaches in Abstract and in Figure 1 a plurality of processor groups 
each comprising of individual processing elements 10a-10h, wherein each group
of elements is connected to the element adjacent to it using data lines 11i-11n
and register shift lines 21i-21n, as set forth in 5:55-6:34, these elements are
clearly connected in series (as in 2:45-60, where it states that these elements are 
connected in a linear chain, where in Figure 1 it is clear that data is input from data
input device 20 over lines 21a and then passed in a one-way manner down the
line on lines 21i->21n))(Bui Figure 2 teaches a plurality of sample managers, e.g.
filter elements in rows, that constitute parallel sums, which are similar to those
found in Wilson as above, where each element generates a result and passes it 
down the chain - see the multipliers and adders, where each multiplier has a 
preassigned coefficient value - see 3:50-65 - e.g. one of the coefficients of the 
3x3 convolution kernel - e.g. the filter) 

-A partial sums bus connecting each sample manager (k) in the series of sample 
managers to the next sample manager (k+1); (Wilson - the term bus is well 
known in the art to merely mean one or more data transfer lines, wherein the 
data lines 11i-11n and 21i-21n clearly move data in byte-size chunks, which
therefore means that they are "buses" in the sense meant by applicant. Clearly,
these buses can transfer partial sums, as in 18:20-60, where it is noted that 
partial sums can be moved along the chain of processors)(Bui - clearly the 
connections between each element constitute a sample bus - the original data 
enters the pipeline and is weighted, then is subjected to a delay and placed on 
the partial sum bus and passed down the addition nodes on the accumulator 
lines) 

-Receive accumulated partial sums from a prior sample manager (k-1), if k is 
greater than zero, wherein partial sums comprise partial sums for each 
parameter value, (Wilson is designed to process images in 8x8 bit submatrices, 
as in Abstract, and 2:45-3:4. Further, in 1:10-35 Wilson teaches that inherently, 
data arrays such as images must be broken into smaller data array sizes with 
dimensions equivalent to the size of the processor array. Therefore, the recited 
'convolution kernel' in applicant's specification consists of a certain N x N region, as
noted in Remarks page 1, of which the 8x8 sub-matrix or sub-array of Wilson 
would clearly qualify. Clearly, the resultant element is passed down the 
processor line to be operated upon, which provides a sum of weighted values for 
that portion of samples. Further, in 18:20-60 it is clearly explained that the 
system is intended to handle convolutions and/or sums as part of processing 
images, where these can clearly be partial sums. Wilson very clearly teaches 
many common tasks in image filtering, such as transposes (16:40-50), 
accumulation (19:20-60), and the like (16:53-18:20, 18:60-19:50))(Bui - Separate
regions of the images are convolved - 4:34-40 - every pixel in the image is
subjected to the convolution kernel and serves as the center point in it before
processing is completed. Bui 2:33-50 provides that video pixel elements that fall 
within the convolution kernel are firstly weighted using multipliers having 
preassigned coefficients, which are then added together. See for example 
Figure 2 - each pixel is firstly sent through a multiplier M02 where it is weighted, 
then along a sample bus 2B through a delay element to an adder A01. This
continues down the chain, where each weighted element is added to the bus line 
and passed down. In other words, the coefficients assigned to the multipliers 
comprise a sum of weights for the various elements, and at the end of each 
branch, a partial sum is output. Then, that partial sum from that row is passed 
along another partial sum bus to another delay element H, where the master 
partial sum bus then adds the row output partial sums together and a final value 
is output as convolved video output 210.) 

-Calculate partial sums for a set of samples, wherein the set of samples are 
within a sub-set of screen space assigned to sample manager (k), and wherein 
the set of samples are located within a convolution kernel defined for a pixel, 
(Wilson, as discussed above, divides the screen into 8x8 submatrices for 
processing, which constitute a sub-set of screen space, which would be assigned 
to each block, as discussed above also, where the block would constitute a 
sample manager (k). Clearly, these are used for accumulation purposes)(Bui 
clearly teaches that a convolution kernel can be any width - the implementation
provided of a 3x3 array is merely an example (3:10-45). As noted above, the 
convolution kernel is passed over each pixel in the image (or subset of an image)
(4:34-40). Therefore, the subset of screen space would be the 8x8 block of
Wilson, and the convolution kernel provided by Bui could be applied, since the 
set of samples is not required to be the same size as the set of screen space, 
although for purposes of convenience, it could be) 

-Add the partial sums to the sample manager (k+1), if k is less than N-1; and
(Plainly meaning that if sample manager (k) is the last one in the series (e.g. k = N-1),
no partial sum addition would take place, since there would be no more
sample managers to send the partial sums to - the addition process would be 
complete)(Wilson clearly teaches that for the accumulator model, each group of 
processing elements passes the partial sums along towards the right. Clearly, 
the system will take the partial sums calculated for its portion of the same sample 
and output the results to the next group of processing elements in the series)(Bui 
Figures 2-4 clearly show that the partial sums produced by any element that is 
not the first in a chain is passed down the partial sums bus, with the output of 
that particular element added to the result already in the partial sums bus, and 
that result is passed down the chain. The last element in the chain outputs the 
convolved video output). 

-Wherein a designated sample manager is operable to calculate pixel values
from the final accumulated partial sums (Wilson displays the results of the 
computations on an output display device, as shown as data output device 22 in 
Figure 1)(Bui clearly shows in Figures 2-4 that the output of the device is pixel
values in video format, and that the output is 'convolved video' as element 210 in 
Figure 2, for example).

Wilson per se teaches most of the limitations of the above claim, but does
not expressly teach certain details of how a chain of linear accumulation 
elements for an image-processing filter would operate, and expressly stating how 
image convolution works, along with explaining the partial sums weighting 
process more completely. 

The Bui reference is added to explicitly cover certain details of how a 
chain of linearly connected accumulation elements for an image-processing filter 
would operate, with respect to how various elements of a sequence of 
processing elements would communicate with regards to partial sums 
transmission, and that the results are passed down the partial sums busses and 
added together. It would have been obvious to one of ordinary skill in the art at 
the time the invention was made to combine the systems of Wilson and Bui 
because the system of Bui induces fewer delays and can operate in a faster
manner for purposes of convolution of specific elements - see for example
5:1-20, which avoids the known larger delays in the prior art, such as 4:46-57. While
the system of Wilson in 2:45-60 is clearly specified to perform image processing
and convolution, as well as accumulation operations, the Wilson reference is
silent on how it actually implements image filtering. The Bui reference clearly
performs image filtering using weighted partial sums buses as defined above,
where in 1:63-2:2 it is specified how accumulation processes are used in image
convolution.

As to claim 15, clearly each group of elements in Wilson has the
corresponding memory across to each group of processing elements in series - 
see Figure 1 as an example. 

Claims 3, 21, 23, 25-27, 30, 32-33, and 36-37 are rejected under 35
U.S.C. 103(a) as being unpatentable over Wilson in view of Bui as applied to 
claims 1 and 2 above, and further in view of Inada et al (US 2004/0004620 
A1) ('Inada').

As to claim 3, the system of claim 2, wherein for each sample manager
the corresponding portion of samples resides in a sub-set of screen space and
the sub-sets are finely interleaved across screen space. Clearly the system
of Inada establishes in [0154] that the system breaks the screen down into blocks
of 4x4 pixels for interleaving, which constitutes a distribution across screen
space, and further in Fig. 1 it is shown how the screen is divided into smaller
areas, where each area is analyzed for the presence of a primitive in the pixels in
that particular, smaller area. Further, Wilson teaches the division of an image
into blocks or sub-arrays for processing and convolution purposes. That being
said, it would have been obvious to one having ordinary skill in the art at the time
the invention was made to combine the systems of Wilson and Bui for the
reasons set forth above (the motivation and combination of claim 2 are herein
incorporated by reference) with the system of Inada, to allow interleaving as that
technique speeds up drawing time (Inada [0155]). 
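
One way that 4x4 pixel blocks could be interleaved across screen space among a series of sample managers is sketched below. The round-robin assignment and the name `manager_for_pixel` are hypothetical and are not taken from Inada [0154]; only the 4x4 block size comes from the passage above.

```python
def manager_for_pixel(x, y, num_managers, block=4):
    """Map a pixel to the sample manager that owns its block.

    Assumption for this sketch: blocks are handed out round-robin
    along both axes, so neighboring blocks go to different managers.
    """
    bx, by = x // block, y // block    # block coordinates of the pixel
    return (bx + by) % num_managers
```

Under this assumed scheme every pixel inside one 4x4 block goes to the same manager, while adjacent blocks go to different managers, so each manager's sub-set of screen space is spread finely across the whole screen.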

As to claim 21, the rejection to claim 1 is incorporated by reference.
Firstly, the claimed filter unit as recited is clearly comparable to the sample 
managers recited in previous claims, as the functionality is the same, and the 
processing element would clearly be performing similar tasks. Each of the N 
memories recited is attached to a group of processing elements, which serves as 
a filtering element / sample manager / generic processor, as set forth in Wilson 
Fig. 1. Each unit of Willson could contain a convolution kernel, with the individual 
taps of the filter being the multipliers of Bui and the like. Clearly the system of 
Inada establishes in [0154] that the system breaks the screen down into blocks of 
4x4 pixels for interleaving, which constitutes a distribution across screen space, 
and further in Fig. 1 it is shown how the screen is divided into smaller areas, 
where each area is analyzed for the presence of a primitive in the pixels in that
particular, smaller area, and in Fig. 7 the screen is shown to be divided into 2x2
bins or tiles containing samples for processing purposes. Further, Wilson teaches
the division of an image into smaller arrays, as Wilson is designed to process
images in 8x8 bit submatrices, as in Abstract, and 2:45-3:4. Further, in 1:10-35
Wilson teaches that inherently, data arrays such as images must be broken into 
smaller data array sizes with dimensions equivalent to the size of the processor 
array. 

Next, the recited N and k can clearly be 1. Obviously, the system of
Wilson has one or more memories per processing group as connected in Figure 1
- see Wilson 5:55-6:45 as shown - and the system of Inada has a graphics
memory 145 with a memory interface circuit 144 as shown in Fig. 2, where
clearly this meets the recited limitations for the fact that N and k can be 1. Each
unit of Bui is an individual tap of the filter or similar implementation. The recited 
numeric limitations - that of N and k, are obvious in that any chain of processors 
would have each processor numbered as set forth in the claim, with the 
respective limitations, in that, for example, the first processor in the chain would 
be numbered zero, and of course the first processor would prima facie not 
receive data from a previous processor as it was the first one in the processor 
chain. 

As set forth in the preceding paragraph, each processor 10a-10h in the 
processing group in Wilson obviously reads from its own memory that contains 
the section of the image assigned to it (as in Inada) for convolution purposes (or
Wilson), performs partial sum calculations on it, and moves it into the next 
element in the linearly connected array (Wilson). The entire question of partial 
sums and their calculations is covered in the sections of the rejection of claim 1
that have been expressly incorporated via reference and will not be repeated for
the purposes of brevity. 

One additional note is that the system of Inada as shown in Fig. 4 clearly 
shows a plethora of operations units attached to each register (e.g. operation 
units 1411-1, 1412-1, etc. attached to registers such as 1411-2), which clearly 
establishes multiple processing / operations units attached to memories in the 
first place. 

Obviously, the system of Inada outputs pixels as set forth in paragraphs 
[0022-0024]. Clearly, the results of all the graphics calculations and convolutions
would be passed out as pixel data as shown there, and it is logical that the end
results of an image convolution calculation from a chain would indeed be output 
as pixels - indeed, an image is fundamentally composed of pixels, and it is a 
fundamental of the digital signal processing art that an image is output in pixels 
from being processed in this context. 

Motivation and combination is taken from the rejection to claim 1, which is
incorporated by reference, and from the additional logic as set forth above. 
Inada brings in the benefits of explaining how the screen space is subdivided so 
that such an array can thusly more efficiently process all the information provided 
from the subdivision of the screen space. Motivation / rationale is also taken
from the rejection to claim above. 

As to claim 23, it is a substantial duplicate of claim 21 under the 
circumstances where M is 1. For other circumstances, obviously the system of
Wilson can be dynamically reconfigured to support the desired arrangement of
processing elements, depending on numbers of processors per group, as it could 
be composed of FPGAs that are fundamentally reprogrammable and/or could 
simply be programmed in a different manner, given that each processing group 
has so many processing elements. Therefore, division into groups is a trivially 
obvious variant. Also, Inada is only included for division of screen space, which 
examiner contends would be a notoriously and trivially obvious variant anyway. 

As to claim 25, it is merely a method implementing the system of claim 21,
and the rejection to claim 21 is valid upon it without further comment.

As to claim 26, see the rejection to claim 3 above, which addresses
regions having defined boundaries, wherein the screen space is divided as set
forth there. Inada [0020] discusses how each region is judged with respect to its 
center point, which clearly establishes that this is an obvious variation. 
Motivation and combination are taken from the parent claim and incorporated 
herein by reference in their entirety. 

As to claim 27, this limitation is expressly covered in the rejection of claim 
1, the relevant portion of which is incorporated by reference, and is also stated 
below. Wilson is designed to process images in 8x8 bit submatrices, as in 
Abstract, and 2:45-3:4. Further, in 1:10-35 Wilson teaches that inherently, data 
arrays such as images must be broken into smaller data array sizes with 
dimensions equivalent to the size of the processor array. Therefore, the recited 
'convolution kernel' in applicant's specification consists of a certain N x N region, as 
noted in Remarks page 1, of which the 8x8 sub-matrix or sub-array of Wilson 
would clearly qualify. Clearly, the resultant element is passed down the 
processor line to be operated upon, which provides a sum of weighted values for 
that portion of samples. Further, in 18:20-60 it is clearly explained that the 
system is intended to handle convolutions and/or sums as part of processing 
images, where these can clearly be partial sums. Wilson very clearly teaches 
many common tasks in image filtering, such as transposes (16:40-50), 
accumulation (19:20-60), and the like (16:53-18:20, 18:60-19:50). Bui clearly 
teaches that the data is passed down the adder / processing element chain and 
that it consists of partial sums as data in Figure 3 and 5:65-6:20. Bui is a filter, 
where this kind of filter inherently consists of weight functions that are 'partial 
sums', where each tap in such a filter very clearly causes a partial sum as the 
outcome, see for example the weights on the multipliers in Figures 1-4. Bui 
performs accumulation, as discussed in 1:65-2:10. 
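
For illustration only, the chain of partial sums discussed above can be sketched as follows. This is a minimal Python sketch and is not taken from Wilson or Bui; the function name chain_partial_sums and the sample and weight values are hypothetical. Each stage stands in for one tap or processing element: it multiplies its sample by its weight, adds the product to the partial sum received from the previous stage, and passes the result along.

```python
def chain_partial_sums(samples, weights):
    """Return the running partial sums produced along a chain of stages.

    Hypothetical helper: stage i adds samples[i] * weights[i] to the
    partial sum handed to it by stage i - 1.
    """
    if len(samples) != len(weights):
        raise ValueError("one weight is needed per sample")
    partial = 0.0
    history = []
    for sample, weight in zip(samples, weights):
        partial += sample * weight   # multiply-accumulate at this stage
        history.append(partial)      # value passed on to the next stage
    return history


# Illustrative values only (assumed, not from any cited reference).
sums = chain_partial_sums([1.0, 2.0, 3.0], [0.5, 0.25, 0.25])
final_value = sums[-1]  # the last stage holds the full weighted sum
```

Every entry of the returned list except the last is a partial sum in the sense used above; only the final stage holds the complete weighted sum.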

As to claim 30, this is an obvious variation and is addressed in the 
rejection to claim 21 and is repeated herein. Obviously, the system of Inada 
outputs pixels as set forth in paragraphs [0022-0024]. Clearly, the results of all 
the graphics calculations and convolutions would be passed out as pixel data as 
shown there, and it is logical that the end results of an image convolution 
calculation from a chain would indeed be output as pixels - indeed, an image is 
fundamentally composed of pixels, and it is a fundamental of the digital signal 
processing art that an image is output in pixels from being processed in this 
context. Motivation and combination are incorporated by reference from the 
parent claim. 

As to claim 32, Inada clearly addresses this limitation wherein it would be 
obvious to divide the screen up into bins and assign each one to an FPGA or 
processing element or filtering element, whatever the generic terminology for the 
groups of Wilson or the elements of Bui. 

As to claim 33, clearly the system of Inada establishes in [0154] that the 
system breaks the screen down into blocks of 4x4 pixels for interleaving, which 
constitutes a distribution across screen space, and further in Fig. 1 it is shown 
how the screen is divided into smaller areas, where each area is analyzed for the 
presence of a primitive in the pixels in that particular, smaller area. Further, 
Wilson teaches the division of an image into blocks for processing and 
convolution purposes. That being said, it would have been obvious to one 
having ordinary skill in the art at the time the invention was made to combine the 
systems of Wilson and Bui for the reasons set forth above (the motivation and 
combination of claim 2 are herein incorporated by reference) with the system of 
Inada, to allow interleaving as that technique speeds up drawing time (Inada 
[0155]). 
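
For illustration only, interleaving blocks of 4x4 pixels across several processing units can be sketched as follows. This is a minimal Python sketch under stated assumptions and is not Inada's implementation: the function name block_owner, the 2x2 grid of units, and the round-robin assignment are all hypothetical.

```python
def block_owner(x, y, block=4, units_x=2, units_y=2):
    """Map pixel (x, y) to the index of the unit that owns its block.

    Hypothetical scheme: the screen is tiled with block x block pixel
    blocks, and the blocks are dealt out to a units_x by units_y grid
    of units in a repeating (interleaved) pattern.
    """
    bx, by = x // block, y // block            # block coordinates
    return (by % units_y) * units_x + (bx % units_x)


# Owners of the first pixel of each block in an 8x8 pixel corner.
owners = [[block_owner(x, y) for x in (0, 4)] for y in (0, 4)]
```

Under this assumed scheme, neighboring blocks always belong to different units, so a primitive covering several blocks is spread across the units rather than loading a single one.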

As to claim 36, this is a trivially obvious variant of claim 30 and is subject to the 
same rejection. 

As to claim 37, this is a trivially obvious variant of claim 33 and is subject 
to the same rejection. 

Claims 4 and 34 are rejected under 35 U.S.C. 103(a) as unpatentable over 
Wilson, Bui, and Inada as applied to claim 3 above, and further in view of Hsieh 
et al (US 6,819,321 B1). 

As to claim 4, Wilson and Bui do not expressly teach these limitations; 
Inada clearly teaches dividing the screen into a plurality of bins but does not 
specifically teach sixteen sample managers and a four by four array of bins. 
Reference Hsieh et al teaches dividing the screen into a number of bins for two- 
dimensional image processing, where the number can be arbitrary, but where an 
example given is four bins in Figure 4 (3:28-40). Applicant has not established 
any criticality to the number of sample managers and/or bins, and as such the 
choice of an arbitrary number of 'sample managers' or processing elements and 
the division of the screen into some arbitrary number of bins is a matter of design 
choice, see In re Harza, 274 F.2d 669, 124 USPQ 378 (CCPA 1960). It would 
have been obvious to one of ordinary skill in the art at the time the invention was 
made to modify Inada, Wilson, and Bui to use arbitrarily scaled bins as per Hsieh, 
since Hsieh decreases memory bandwidth required and provides numerous 
other benefits (see 2:20-40 and Abstract). 

As to claim 34, it is identical to claim 4, the rejection to which is 
incorporated by reference. 

Claims 5, 7, and 37 are rejected under 35 U.S.C. 103(a) as unpatentable 
in view of Wilson and Bui as applied to claim 1, and further in view of Cloutier. 

As to claim 5, Wilson and Bui do not expressly teach this limitation. 
Cloutier clearly teaches a plurality of groups of FPGAs in Figure 1, with these 
controllable by the SIMD process controller. Cloutier Fig. 1 clearly illustrates a 
plurality of FPGAs configured in a matrix connection, all with global bus 
connections, and Fig. 3 illustrates similar connections between PEs on one 
FPGA. It is notoriously well known that an FPGA can be configured to emulate 
any other type of processor, e.g. the system of Wilson/Bui as noted above. 
Cloutier teaches that such a system works quickly and is more efficient for 
processing images and the like. Cloutier clearly establishes, as set forth 
above, that each PE performs convolution based on weighted partial sum 
operations. Furthermore, the nature of convolution is such that once partial sums 
are computed, they must be acted on by other elements or processors to 
produce the final, desired results. 

As such, examiner takes the position that the explanations cited above in 
response to each element of the portion of the claim dealing with the computation 
and/or calculation of partial sums more than adequately meet all of the 
limitations set forth by that section of the claim. Further, any PE that received 
partial sums from the bus would clearly add them using the multiply-accumulate 
operations cited above, particularly in the case of a neural network that was 
being used to perform convolution, which would be obvious to do since the 
system of Cloutier clearly has established utility for performing both tasks, and 
optical character recognition (OCR), which requires convolution and 
pre-processing. Further, Cloutier clearly teaches the applicability of his system to 
image processing in 4:20-35. See again Cloutier 7:26-55, where it is well known that 
the partial sums must be added, and 4:20-35, where it is taught that the present 
embodiment is well suited for matrix and vector addition and multiplication. More 
specifically, the embodiment of Cloutier is taught to perform multiply-accumulate 
operations (8:50-9:15), which clearly requires that the network of processing 
elements perform multiply-accumulate operations per tile (with each processing 
element performing said operations), and in a neural network application, which 
provides feed-forward information (e.g. feedback) for pattern recognition and 
similar, multiply-accumulate operations used in the processing of an image would 
obviously be added and passed along, as the architecture of Cloutier as shown in 
Fig. 1 is such that data is passed along between elements in the additive fashion 
as set forth above. Also, the system performs convolution and uses partial sums, 
which clearly requires that the accumulated partial sums be passed to other 
elements. Cloutier clearly establishes in 7:26-55 that the system has many 
processing elements, each of which computes its own partial sums for convolution 
purposes. 
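
For illustration only, the idea that several processing elements each compute their own partial sum by multiply-accumulate, with the partial sums then added to give the final value, can be sketched as follows. This is a minimal Python sketch under stated assumptions and is not Cloutier's implementation: the function names, the assignment of one kernel row per element, and the kernel and window values are hypothetical. The kernel is applied without flipping, which gives the same result as convolution for the symmetric kernel used here.

```python
def row_partial_sum(kernel_row, window_row):
    """Multiply-accumulate over one row (the work of one hypothetical PE)."""
    total = 0
    for k, p in zip(kernel_row, window_row):
        total += k * p
    return total


def weighted_sum_at_pixel(kernel, window):
    """Add the per-row partial sums into one output value."""
    partials = [row_partial_sum(kr, wr) for kr, wr in zip(kernel, window)]
    return partials, sum(partials)


# Illustrative 3x3 symmetric kernel and pixel window (assumed values).
kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
window = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
partials, value = weighted_sum_at_pixel(kernel, window)
```

No single element in this sketch holds the output value on its own; the result exists only once the partial sums have been combined.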

In light of all of the above, it would have been obvious to one of ordinary 
skill in the art at the time the invention was made to combine the system of 
Wilson/Bui with that of Cloutier, such that each FPGA emulated the system of 
Wilson/Bui and therefore allowed each section to handle one portion of screen 
space, as this parallel computation would inherently be faster since the parts 
would be duplicated and the net speed would increase. This is notoriously well 
known in the art. 

As to claim 7, this is a duplicate of claim 2, the rejection to which is 
incorporated by reference. 

As to claim 38, this is a duplicate of claim 5, the rejection to which is 
incorporated by reference in addition to the rejection to claim 37 above. 

Claim 6 is rejected under 35 U.S.C. 103(a) as unpatentable over 
Wilson/Bui in view of Cloutier as applied to claim 5 above, and further in view of 
Hsieh. 

Therefore, the rejections of claims 2 and 5 are incorporated by reference, 
and motivation and rationale are taken from each of them. 

As to claim 2, clearly all of the processing elements in Fig. 1 and Fig. 3 of 
Cloutier are connected via the global bus anyway, which clearly meets the 
requirements that all the sample managers be chained, and that the final member 
calculates pixel values - clearly each chip is computing pixel values from the 
results of the prior one - see Cloutier 7:7-67. Further, as explained above in the 
rejection to claim 1, the FPGAs and the PEs within each FPGA are all 
interconnected, and the PEs and the FPGAs can clearly be connected in a chain 
or serial fashion, as the very nature of an FPGA is that the blocks can be set to 
have any desired set of connections with bidirectional or unidirectional 
communications. 

Reference Hsieh et al teaches dividing the screen into a number of bins 
for two-dimensional image processing, where the number can be arbitrary, but 
where an example given is four bins in Figure 4 (3:28-40). Applicant has not 
established any criticality to the number of sample managers and/or bins, and as 
such the choice of an arbitrary number of 'sample managers' or processing 
elements and the division of the screen into some arbitrary number of bins is a 
matter of design choice, see In re Harza, 274 F.2d 669, 124 USPQ 378 (CCPA 
1960). It would have been obvious to one of ordinary skill in the art at the time 
the invention was made to modify Inada, Wilson, and Bui to use arbitrarily scaled 
bins as per Hsieh, since Hsieh decreases memory bandwidth required and 
provides numerous other benefits (see 2:20-40 and Abstract). 

References Wilson and Bui do not expressly teach this limitation, whilst 
Reference Cloutier teaches in 7:28-40 specifically that a matrix of 8x4 PEs is 
implemented on each FPGA, where there is a 2x2 array of FPGAs in the first 
place, and Inada provides additional support. Given this, it would be reasonable 
to use a 4x4 array instead of an 8x4 array, given that each FPGA could easily be 
partitioned into a 4x4 array, as illustrated in Figure 3 anyway. Now, as set forth 
in the rejections to claims 1-3 above, the system of Cloutier is taught for use with 
image convolution, and further in Inada the use of interlaced scans is taught, 
such that the screen is divided up into units of say 4x4 pixels for faster drawing 
time. Therefore, if one FPGA with a 4x4 array of PEs, or a 2x2 array of FPGAs 
with a 2x2 PE implementation, with each one dedicated to processing a certain 
portion of the screen was used, and interleaving was used for the results, it 
would be logical to use the claimed 4x4 architecture. Motivation and combination are 
taken from the parent claim and herein incorporated by reference, with additional 
motivation as set forth in the immediately preceding paragraph. 
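
For illustration only, the arithmetic behind treating a 2x2 array of FPGAs, each with a 2x2 array of PEs, as a single 4x4 array can be sketched as follows. This is a minimal Python sketch; the function name locate_pe and the coordinate convention are hypothetical and are not drawn from Cloutier or Inada.

```python
def locate_pe(row, col, pes_per_side=2):
    """Split a position in the overall PE grid into (fpga, pe) coordinates."""
    fpga = (row // pes_per_side, col // pes_per_side)  # which FPGA
    pe = (row % pes_per_side, col % pes_per_side)      # which PE on that FPGA
    return fpga, pe


FPGAS_PER_SIDE = 2  # assumed 2x2 array of FPGAs
PES_PER_SIDE = 2    # assumed 2x2 array of PEs per FPGA
side = FPGAS_PER_SIDE * PES_PER_SIDE

# One entry per position of the combined grid, 16 elements in total.
grid = [[locate_pe(r, c, PES_PER_SIDE) for c in range(side)]
        for r in range(side)]
```

Each of the sixteen positions maps to exactly one PE on exactly one FPGA, so the two-level arrangement and a flat 4x4 array address the same set of elements.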

Claim 22 is rejected under 35 U.S.C. 103(a) as unpatentable over Wilson, 
Bui, and Inada as applied to claim 21 above, and further in view of Hsieh et al 
(US 6,819,321 B1). 

As to claim 22, references Wilson and Bui do not teach this limitation; 
Inada clearly teaches dividing the screen into a plurality of bins but does not 
specifically teach sixteen sample managers and a four by four array of bins. 
Reference Hsieh et al teaches dividing the screen into a number of bins for two- 
dimensional image processing, where the number can be arbitrary, but where an 
example given is four bins in Figure 4 (3:28-40). Applicant has not established 
any criticality to the number of sample managers and/or bins, and as such the 
choice of an arbitrary number of 'sample managers' or processing elements and 
the division of the screen into some arbitrary number of bins is a matter of design 
choice, see In re Harza, 274 F.2d 669, 124 USPQ 378 (CCPA 1960). It would 
have been obvious to one of ordinary skill in the art at the time the invention was 
made to modify Inada, Wilson, and Bui to use arbitrarily scaled bins as per Hsieh, 
since Hsieh decreases memory bandwidth required and provides numerous 
other benefits (see 2:20-40 and Abstract). 

Claim 23 is rejected under 35 U.S.C. 103(a) as unpatentable over 
Wilson, Bui, Inada, and Cloutier. 

As to claim 23, this is a duplicate of claim 21, which is incorporated by 
reference, with the additional limitations from claim 5 above, the rejection to 
which is also incorporated by reference. Wilson uses groups of eight processors 
that can be configured into different exemplary configurations anyway, and 
Cloutier is dynamically reconfigurable as pointed out above. Also, Inada is only 
included for division of screen space, which examiner contends would be a 
notoriously and trivially obvious variant anyway. 

Claim 24 is rejected under 35 U.S.C. 103(a) as unpatentable over Wilson, 
Bui, Cloutier, and Inada as applied to claim 3 above, and further in view of Hsieh 
et al (US 6,819,321 B1). 

As to claim 24, Inada clearly teaches dividing the screen into a plurality of 
bins but does not specifically teach sixteen sample managers and a four by four 
array of bins. Reference Hsieh et al teaches dividing the screen into a number of 
bins for two-dimensional image processing, where the number can be arbitrary, 
but where an example given is four bins in Figure 4 (3:28-40). Applicant has not 
established any criticality to the number of sample managers and/or bins, and as 
such the choice of an arbitrary number of 'sample managers' or processing 
elements and the division of the screen into some arbitrary number of bins is a 
matter of design choice, see In re Harza, 274 F.2d 669, 124 USPQ 378 (CCPA 
1960). It would have been obvious to one of ordinary skill in the art at the time 
the invention was made to modify Inada, Wilson, Bui, and Cloutier to use 
arbitrarily scaled bins as per Hsieh, since Hsieh decreases memory bandwidth 
required and provides numerous other benefits (see 2:20-40 and Abstract). 



Conclusion 

The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

US 6,504,959 to Lee, which discusses cascaded filter elements, which 
applicant should closely examine. 

US 4,984,286 to Dolazza, which is a convolution filter. 

Any inquiry concerning this communication or earlier communications from 
the examiner should be directed to Eric Woods whose telephone number is 
571-272-7775. The examiner can normally be reached on M-F 7:30-5:00. 

If attempts to reach the examiner by telephone are unsuccessful, the 
examiner's supervisor, Ulka Chauhan, can be reached on 571-272-7782. The fax 
phone number for the organization where this application or proceeding is 
assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from 
the Patent Application Information Retrieval (PAIR) system. Status information 
for published applications may be obtained from either Private PAIR or Public 
PAIR. Status information for unpublished applications is available through 
Private PAIR only. For more information about the PAIR system, see 
http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 
(toll-free). If you would like assistance from a USPTO Customer Service 
Representative or access to the automated information system, call 
800-786-9199 (IN USA OR CANADA) or 571-272-1000. 